

1 SUMMARY

The objective of this project was to conduct research that advances wideband autonomous cognitive radio (WACR) technology. WACRs are radios that can sense the state of the radio frequency (RF) spectrum and of the network, and self-optimize their operating modes in response to this sensed state. First, this project developed a formal framework for robust spectrum knowledge acquisition in a WACR, and this framework was implemented in a simulation scenario to evaluate its performance. An important function of a WACR is sub-band selection, which allows the radio to operate over a wide spectrum range with real-time awareness of the spectrum state. Thus, a machine-learning-based sub-band selection algorithm for WACRs was developed using reinforcement learning, and its performance was evaluated through a combination of analysis and simulations. Finally, motivated by certain application scenarios of interest, a completely new definition of the state of the spectrum of interest to a WACR was developed. This new approach is currently being used to develop practical cognitive communications protocols with considerably lower computational complexity than previous alternatives. Future work will include the design, implementation and analysis of cognitive communications protocols suited for space and satellite communications using this new state definition.


2 INTRODUCTION

Wideband autonomous cognitive radios (WACRs), proposed in [1] and investigated in this project, are a potential future technology for realizing autonomous radio communications over non-contiguous wide spectrum bands in the presence of adverse conditions [1,2]. These adverse conditions may be deliberate as well as inadvertent. Moreover, encroachment on previously allocated spectrum resources by commercial and unlicensed or unauthorized users can only be expected to grow in the coming years. The combination of these traditional and evolving spectrum demands requires future telecommunications technologies to be intelligent, self-aware and spectrally agile, which are precisely the defining characteristics of the WACRs pursued in this project.
Spectrum awareness is the most salient feature of cognitive radios, the one that makes them cognitive, and spectrum sensing is the process of acquiring spectrum awareness [1,3,4]. In the case of WACRs, spectrum sensing is usually performed over several non-contiguous spectrum bands, each spanning from hundreds of MHz up to the order of a GHz [1,2,5-7]. Due to the wide bandwidth and non-contiguous nature of the frequency range of interest, the spectrum knowledge acquisition problem posed by a WACR is significantly more challenging than the simple spectrum sensing performed by a dynamic spectrum sharing (DSS) cognitive radio. In particular, a wideband cognitive radio is expected to identify all spectral activities present in the spectrum band of interest [1]. These signals can be of different types and be located at unknown carrier frequencies spread over a wide spectrum range. Moreover, noise, interference and propagation properties can vary significantly over the spectrum of interest to a WACR, rendering invalid many assumptions usually made in conventional signal processing.

The spectrum knowledge acquisition problem posed by a WACR can be divided into three steps, as shown in Fig. 1 [1]:

1. Wideband spectrum scanning problem: how to efficiently and effectively scan a wide spectrum range in real time, or near real time.

2. Spectral activity detection problem: how to detect the active signals in the sensed spectrum.

3. Signal identification problem: how to classify the detected signals and identify their origins.


Figure 1. Spectrum knowledge acquisition consists of a planning stage and a processing stage.
For a WACR to detect spectrum opportunities, it must be able to observe and interpret its surrounding RF environment. The first stage of this process, as shown in Fig. 1, is wideband spectrum scanning [1,8]. Hardware constraints limit the instantaneous sensing bandwidth of most state-of-the-art software-defined radio (SDR) platforms. Hence, the challenge in this step is to design an efficient scheme that achieves real-time sensing over a wide spectrum range. To scan a wide spectrum band in real time, the spectrum of interest is first divided into a set of sub-bands, each of which can be wide enough to contain multiple communication channels. Since the WACR can sense only one sub-band at any given time, it must determine which sub-band to sense at each time instant. This is known as the sub-band selection problem in wideband spectrum sensing [1].
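As a minimal sketch of the partitioning step just described, the Python function below splits a set of non-contiguous frequency ranges into sub-bands no wider than the front-end's instantaneous bandwidth. The function name `partition_into_subbands`, the example ranges and the 40 MHz width (the instantaneous bandwidth of the SDR board mentioned later in this report) are illustrative assumptions, not part of the proposed framework.

```python
def partition_into_subbands(ranges_hz, subband_bw_hz):
    """Split each (f_low, f_high) range into contiguous sub-bands of width at most
    subband_bw_hz. Returns a list of (f_low, f_high) tuples in Hz (hypothetical helper)."""
    subbands = []
    for f_low, f_high in ranges_hz:
        f = f_low
        while f < f_high:
            # The last sub-band of a range is truncated at the range's upper edge.
            subbands.append((f, min(f + subband_bw_hz, f_high)))
            f += subband_bw_hz
    return subbands


# Two non-contiguous ranges (illustrative values) and a 40 MHz instantaneous bandwidth.
ranges = [(400_000_000, 480_000_000), (2_200_000_000, 2_300_000_000)]
subbands = partition_into_subbands(ranges, 40_000_000)
print(len(subbands), subbands[-1])   # 5 (2280000000, 2300000000)
```

The sub-band selection problem is then the question of which element of such a list to sense in each time slot.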

After a spectrum band of interest is scanned, the second step is to detect any spectral activity present in the sensed sub-band. We emphasize that the objective of a WACR is to detect and identify all spectral activities present in the spectrum bands of interest to the cognitive radio, not merely to detect white space in the spectrum. Thus, a third step of signal classification and identification may be needed after signal detection [1,7,9].

The long-term objective of this project is to systematically develop a comprehensive wideband spectrum knowledge acquisition framework over the frequencies of interest to satellite and space communications. This will advance the proposed WACR technology, leading to future space and satellite radio systems that can be autonomous, self-aware and intelligent.

During the last performance period, we have specifically focused on the following aspects of this larger project:

• Developing and optimizing a compressive sampling-based robust spectrum estimation algorithm suitable for a WACR.

• Developing a machine-learning-aided sub-band selection algorithm for a WACR based on the reinforcement learning paradigm.

• Developing a new mathematical definition of the state of the spectrum specifically suited to the WACR context, and using it to design new machine-learning-aided sub-band selection algorithms.

Continuing from our previous work [10], we have continued to optimize our compressive sampling-based robust spectrum estimation approach. Note that compressive sampling is used to reduce the high sampling-rate requirements of wideband spectrum sensing [1,11]. Compressive sampling can be an efficient technique for reconstructing a signal that is sparse with respect to some basis, i.e., a signal whose expansion coefficients in that basis are mostly zero [12,13]. The efficiency afforded by compressive sampling is two-fold. First, fewer samples suffice than are needed with traditional Shannon-Nyquist sampling. Second, the signal can be reconstructed from this reduced number of samples by an algorithm with low computational complexity [1].

The suitability of a compressive sampling-based front-end for a cognitive radio hinges on the fact that, in many situations, the sensed sub-bands have low spectrum utilization, which makes them sparse with respect to a frequency-domain basis. This is indeed usually the case in many spectrum ranges. Even when a few hundred MHz of spectrum may be highly utilized, frequency-domain sparsity may still hold, depending on the instantaneous bandwidth of the spectrum sensing front-end and the chosen sub-band bandwidth [1]. In fact, it is the relatively large sub-band bandwidth that may necessitate a robust spectral activity detector in place of one designed under the Gaussian assumption [1]. The robust spectral activity detection approach we have been developing during this project, based on compressive sampling, aims to provide robustness against possibly non-Gaussian jammers, interference and other unwanted electromagnetic interference (EMI) [1,10].

The sub-band dynamics model proposed in [1] can reasonably be used to develop effective sub-band selection algorithms. To overcome the unavailability of model parameters and time-varying conditions, machine learning can be incorporated into such decision algorithms. In this project, we have specifically focused on reinforcement learning algorithms for this purpose because of their suitability for Markov environments [1,14]. As we discuss in the next section, however, it may be possible to develop new models of spectrum dynamics based on new mathematical definitions of the spectrum state, as needed in particular application scenarios. These new models may lead to learning and decision algorithms with considerably lower computational complexity, making them attractive in practice.
3 METHODS, ASSUMPTIONS, AND PROCEDURES


Robust Wideband Spectrum Knowledge Acquisition:

Let us consider a wide spectrum band of $B$ Hz (where $B$ can be on the order of hundreds of MHz to a GHz) that is first segmented into several sub-bands [1]. Note that each sub-band may contain several channels, possibly corresponding to different communications systems. As illustrated in Fig. 2, let us assume that there are $N_b$ sub-bands. In general, scanning these sub-bands, which span several non-contiguous frequency ranges, can be achieved using reconfigurable antennas [15-23].


Figure 2. The spectrum of interest to a wideband cognitive radio (usually in the GHz range) is divided into a set of $N_b$ sub-bands (sub-band $i$, $i = 1, \ldots, N_b$), shown (a) at time instant $t = t_1$ and (b) at time instant $t = t_2$. Each sub-band may contain different types of signals, and the amount of white space can be time-varying.
Let us denote the $N$-length discrete-time sub-band signal by $y$, where $N$ is chosen to satisfy the Nyquist sampling criterion. The discrete frequency-domain representation $y_f$ of this sub-band signal can then be written as [1,10,11]

$$ y_f = C\,y \tag{1} $$

where $C$ is an $N$-point discrete Fourier transform (DFT) matrix.
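To make (1) concrete, the NumPy sketch below builds an $N$-point DFT matrix $C$ and checks it against the library FFT. We assume the unitary scaling $1/\sqrt{N}$, so that $C^{-1} = C^H$, which is the convention implied when the inverse transform is written with $C^H$. The scaling choice and the function name `dft_matrix` are our assumptions.

```python
import numpy as np


def dft_matrix(n: int, unitary: bool = True) -> np.ndarray:
    """N-point DFT matrix with entries C[k, t] = exp(-j 2 pi k t / N),
    scaled by 1/sqrt(N) when unitary=True (assumed convention)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    t = np.arange(n).reshape(1, -1)   # time index (columns)
    c = np.exp(-2j * np.pi * k * t / n)
    return c / np.sqrt(n) if unitary else c


N = 128
rng = np.random.default_rng(0)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary test signal
C = dft_matrix(N)
y_f = C @ y                                                  # eq. (1)

# The matrix product matches NumPy's orthonormal FFT, and C^H inverts C.
print(np.allclose(y_f, np.fft.fft(y, norm="ortho")))   # True
print(np.allclose(C.conj().T @ y_f, y))                # True
```

In practice the FFT evaluates (1) in $O(N \log N)$ operations; the explicit matrix is mainly useful for writing down the measurement model.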

When spectrum utilization within a sub-band is low, as in Fig. 2, we may expect $y_f$ to be sparse with respect to the frequency-domain basis. While conventional Shannon-Nyquist sampling theory does not take such sparsity into account, compressive sampling allows it to be exploited to detect spectral activities in each sub-band with a reduced number of samples. Indeed, suppose that the sensed signal within the sub-band of interest is sparse and that we collect only $M < N$ randomly selected samples of the signal $y$:

$$ y_c = \Phi\,y = \Phi\,C^{H} y_f \tag{2} $$

where $\Phi$ is an $M \times N$ random sampling matrix and $y_c$ is an $M$-length observation vector. As shown in [12,13], if the sampled signal is sparse, then $y$ (or $y_f$) can indeed be reconstructed from its randomly compressively sampled (under-sampled) version $y_c$. In the presence of noise, (2) becomes

$$ y_c = \Phi\,y + w \tag{3} $$

where $w$ is an $M$-length arbitrary noise vector.
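The NumPy sketch below instantiates the measurement model (2)-(3) for a frequency-sparse sub-band signal. Two choices are ours, not the report's: $\Phi$ is built by keeping $M$ randomly chosen rows of the identity (i.e., selecting $M$ random time samples), and the three active bins, their amplitudes and the noise level are illustrative. (The simulations in the results section instead draw the sensing matrix from a normal distribution.) The helper `random_sampling_matrix` is hypothetical.

```python
import numpy as np


def random_sampling_matrix(m: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """M x N matrix Phi that keeps M randomly chosen rows of the N x N identity,
    i.e. it selects M of the N Nyquist-rate time samples (assumed construction)."""
    rows = np.sort(rng.choice(n, size=m, replace=False))
    phi = np.zeros((m, n))
    phi[np.arange(m), rows] = 1.0
    return phi


rng = np.random.default_rng(1)
N, M = 128, 64

# Frequency-sparse sub-band signal: three active DFT bins (illustrative values).
y_f = np.zeros(N, dtype=complex)
y_f[[5, 20, 40]] = [8.0, 15.0, 22.0]

C = np.fft.fft(np.eye(N), norm="ortho")   # unitary N-point DFT matrix
y = C.conj().T @ y_f                       # time-domain signal, y = C^H y_f

Phi = random_sampling_matrix(M, N, rng)
A = Phi @ C.conj().T                       # maps y_f to the noise-free y_c, eq. (2)
w = 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))  # arbitrary noise
y_c = Phi @ y + w                          # eq. (3)

print(y_c.shape, np.allclose(A @ y_f, Phi @ y))   # (64,) True
```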

Our previously proposed compressive sampling-based robust spectrum estimator yields more accurate spectral estimates in the presence of possibly non-Gaussian noise and interference [10]. This can improve the spectral activity detection performance of the cognitive radio when the noise is indeed non-Gaussian. Specifically, the proposed algorithm reconstructs the frequency-domain sub-band signal from $y_c$ by solving the following optimization problem [11]:

$$ y_f^{*} = \arg\min_{y_f \in \mathbb{C}^{N}} \; L_H\!\left(y_c - \Phi\,C^{H} y_f\right) + \gamma\,\|y_f\|_1 \tag{4} $$

where $\gamma$ is a smoothing parameter that balances the $\ell_1$-norm (sparsity) term against the Huber cost $L_H(z) = \sum_{i=1}^{M} \rho_H(z_i)$, whose element-wise function, with threshold $\delta_H > 0$, is defined as

$$ \rho_H(z) = \begin{cases} \tfrac{1}{2}|z|^{2}, & |z| \le \delta_H \\ \delta_H |z| - \tfrac{1}{2}\delta_H^{2}, & |z| > \delta_H \end{cases} \tag{5} $$

which is known to be minimax optimal against $\epsilon$-contaminated Gaussian noise [1,24,25].
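Problem (4) can be solved in several ways. As one hedged illustration, and not necessarily the solver used in [10,11], the sketch below applies proximal-gradient iterations (ISTA): a gradient step on the smooth Huber term of (5), whose derivative is 1-Lipschitz, followed by complex soft-thresholding, the proximal operator of $\gamma\|y_f\|_1$. The function names, the step size $1/\|A\|_2^2$ with $A = \Phi C^H$, and the iteration count are our assumptions.

```python
import numpy as np


def huber(z, delta):
    """Element-wise Huber cost rho_H(z) of eq. (5); works for real or complex z."""
    a = np.abs(z)
    return np.where(a <= delta, 0.5 * a**2, delta * a - 0.5 * delta**2)


def huber_grad(z, delta):
    """Derivative of the Huber cost: z inside the threshold, delta * z / |z| outside."""
    a = np.abs(z)
    scale = np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))
    return scale * z


def soft_threshold(x, t):
    """Complex soft-thresholding: proximal operator of t * ||x||_1."""
    a = np.abs(x)
    return np.where(a > t, (1.0 - t / np.maximum(a, 1e-12)) * x, 0.0)


def objective(y_f, y_c, A, gamma, delta):
    """Cost of eq. (4): Huber data-fit term plus gamma * l1 norm."""
    return np.sum(huber(y_c - A @ y_f, delta)) + gamma * np.sum(np.abs(y_f))


def robust_cs(y_c, A, gamma, delta, n_iter=500):
    """ISTA sketch for eq. (4), with A = Phi @ C^H (assumed solver, not the report's)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    y_f = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = -A.conj().T @ huber_grad(y_c - A @ y_f, delta)
        y_f = soft_threshold(y_f - step * grad, step * gamma)
    return y_f


# Illustrative use: 3-sparse spectrum, 40 of 64 Nyquist-rate samples kept.
rng = np.random.default_rng(0)
N, M = 64, 40
C = np.fft.fft(np.eye(N), norm="ortho")
rows = np.sort(rng.choice(N, size=M, replace=False))
A = C.conj().T[rows, :]                             # Phi @ C^H for row selection
y_f_true = np.zeros(N, dtype=complex)
y_f_true[[3, 10, 20]] = [4.0, 6.0, 8.0]
y_c = A @ y_f_true
y_f_hat = robust_cs(y_c, A, gamma=0.05, delta=1.0)
print(objective(y_f_hat, y_c, A, 0.05, 1.0) < objective(np.zeros(N), y_c, A, 0.05, 1.0))
```

Because the Huber term grows only linearly for residuals larger than $\delta_H$, impulsive (heavy-tailed) noise samples pull the estimate far less than they would under a squared-error (Gaussian-optimal) fit.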

Reinforcement Learning-based Sub-band Selection for Wideband Spectrum Sensing in Cognitive Radios:

Again, as shown in Fig. 2, we assume that the spectrum of interest to the WACR is divided into $N_b$ sub-bands and that each sub-band may include a different number of communication channels. We denote by $M_i$ the number of channels in the $i$-th sub-band, and by $k \in \{1, 2, \ldots\}$ the time-slot index. For simplicity, we assume that the channel state is constant within a single time slot. At any given time, each channel can be in one of two states: either occupied by another radio system (busy) or available for use by the WACR (idle). We assume that this idle/busy state of each channel evolves according to a two-state (0/1), first-order Markov chain [1,8,26].


Figure 3. Markov chain model for the $i$-th sub-band when the state is defined to be the number of idle channels in the sub-band [1].


Sub-band selection decisions by a WACR depend on its performance objective. In the following, we assume that the goal of the WACR is to find the sub-band that has the largest number of idle channels, as proposed in [1]. Hence, we may define a new state $S_i[k]$ denoting the number of idle channels in the $i$-th sub-band at time $k$, where $S_i[k] \in \{0, 1, \ldots, M_i\}$. If the channel idle/busy dynamics are Markov, as assumed above, then the dynamics of this new state are also Markov [1]. Figure 3 shows the sub-band Markov model, with $p^{(i)}_{s,s'}$ denoting the transition probability of the $i$-th sub-band from state $s$ to state $s'$ [1]. The overall spectrum state at time $k$ can then be defined as $S[k] = \{S_1[k], S_2[k], \ldots, S_{N_b}[k]\}$. Let us denote by $\mathcal{S}$ the set of all possible states that $S[k]$ may take. Then $\mathcal{S}$ contains $Z$ possible states, where

$$ Z = \prod_{i=1}^{N_b} \left(M_i + 1\right). $$
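A minimal sketch of how the sub-band chain in Fig. 3 and the count $Z$ can be computed is given below. It assumes, beyond what the text states, that the $M_i$ channels of a sub-band are independent and statistically identical, with per-slot probabilities `p01` (busy to idle) and `p10` (idle to busy); under that assumption the idle-channel count is itself a Markov chain, and $p^{(i)}_{s,s'}$ is the probability that the channels that stay idle plus the channels that become idle add up to $s'$. The function names and probability values are hypothetical; the channel split (3, 2 and 3 channels) mirrors the simulations reported later.

```python
from math import comb

import numpy as np


def subband_transition_matrix(n_ch: int, p01: float, p10: float) -> np.ndarray:
    """(n_ch+1) x (n_ch+1) matrix P with P[s, s'] = Pr(S[k+1] = s' | S[k] = s),
    where S is the number of idle channels among n_ch i.i.d. two-state channels."""
    P = np.zeros((n_ch + 1, n_ch + 1))
    for s in range(n_ch + 1):                      # idle channels now
        for stay in range(s + 1):                  # idle channels that stay idle
            p_stay = comb(s, stay) * (1 - p10) ** stay * p10 ** (s - stay)
            for new in range(n_ch - s + 1):        # busy channels that become idle
                p_new = comb(n_ch - s, new) * p01 ** new * (1 - p01) ** (n_ch - s - new)
                P[s, stay + new] += p_stay * p_new
    return P


def num_spectrum_states(channels_per_subband) -> int:
    """Z = prod_i (M_i + 1): size of the joint spectrum state space."""
    return int(np.prod([m + 1 for m in channels_per_subband]))


channels = [3, 2, 3]                                # M_1, M_2, M_3
P1 = subband_transition_matrix(channels[0], p01=0.3, p10=0.2)
print(np.allclose(P1.sum(axis=1), 1.0))            # True: each row is a distribution
print(num_spectrum_states(channels))               # 48
```

The joint state space grows multiplicatively with the number of sub-bands, which is one source of the complexity discussed below.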

At any given time, a WACR can observe only a single sub-band out of the total $N_b$ sub-bands due to realistic hardware constraints. Hence, the sub-band selection problem can be considered a decision-making problem in which the system state can be observed only partially. Since we have assumed that the underlying system dynamics are Markov, the sub-band selection problem can be modeled as a partially observable Markov decision process (POMDP) [1]. We may define the selection process at time $k$ as taking an action $a[k] \in \mathcal{A}$, where the action space $\mathcal{A} = \{1, 2, \ldots, N_b\}$ is the set of sub-band indices. Let $Y[k]$ represent a partial observation corresponding to state $S[k]$, and let $r(S[k], a[k])$ represent the immediate reward from taking action $a \in \mathcal{A}$ when in state $S[k]$ at time $k$. We define the reward to be the number of idle channels available in the $a$-th sub-band at time $k+1$ if action $a$ (i.e., sub-band $a$) was chosen when in state $S[k]$ at time $k$ [1,8,26]. Note that action $a[k]$ is selected before observing $Y[k]$, which corresponds to the current state at time $k$. Instead, what is available to the WACR is the history of past observations, actions and associated rewards up to the current time $k$, denoted by $h[k]$.

Given all the information available up to time $k$, we may define the a posteriori probability $b_m[k]$ as our belief that the current state $S[k]$ is $s_m$. The set of all a posteriori probabilities corresponding to all possible states is called the belief state vector $b[k] = [b_1[k], b_2[k], \ldots, b_Z[k]]^T$, with $b_m[k] \in [0,1]$ for $m = 1, \ldots, Z$ [1]. It is a well-known result that the belief state vector is a sufficient statistic for optimal decision making in a POMDP [1,27]. Thus, when making a decision, instead of taking into account the entire history $h[k]$, we may rely only on the belief state $b[k]$.
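The belief vector can be propagated recursively by a standard Bayes filter: predict with the joint transition matrix, then condition on what was observed. The sketch below assumes, beyond the text, that sub-bands evolve independently (so the joint transition probability is the product of the per-sub-band ones) and that sensing sub-band $a$ reveals its idle-channel count exactly; with imperfect sensing, the indicator mask would be replaced by an observation likelihood. The function names and the toy transition matrix are hypothetical.

```python
import itertools

import numpy as np


def joint_states(channels_per_subband):
    """Enumerate all Z joint states s_m = (S_1, ..., S_Nb) as tuples."""
    return list(itertools.product(*[range(m + 1) for m in channels_per_subband]))


def joint_transition(states, P_list):
    """Z x Z joint transition matrix, assuming independent sub-bands."""
    Z = len(states)
    T = np.ones((Z, Z))
    for i, s in enumerate(states):
        for j, s_next in enumerate(states):
            for band, P in enumerate(P_list):
                T[i, j] *= P[s[band], s_next[band]]
    return T


def belief_update(b, T, states, action, observed_idle):
    """b[k-1] -> b[k] after sensing sub-band `action` and observing its idle count."""
    b_pred = T.T @ b                                   # prediction step
    mask = np.array([s[action] == observed_idle for s in states], dtype=float)
    b_post = b_pred * mask                             # perfect-sensing likelihood
    total = b_post.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return b_post / total


# Two single-channel sub-bands with the same two-state channel chain (illustrative).
P = np.array([[0.7, 0.3],      # busy -> busy, busy -> idle
              [0.2, 0.8]])     # idle -> busy, idle -> idle
states = joint_states([1, 1])                           # [(0,0), (0,1), (1,0), (1,1)]
T = joint_transition(states, [P, P])
b = np.full(len(states), 1.0 / len(states))             # uniform initial belief
b = belief_update(b, T, states, action=0, observed_idle=1)
print(np.round(b, 3))   # belief 0.45 on (1, 0) and 0.55 on (1, 1)
```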
Finding an optimal policy for the sub-band selection POMDP, however, poses many challenges. First, it may require high computational complexity due to the continuous state space of the belief state vector [1,8,28-30]. Second, a policy needs to be computed in real time. Moreover, updating the belief state vector requires knowledge of the sub-band Markov model parameters, in particular the transition probabilities of the model. In addition, these model parameters may vary with time due to the dynamic nature of the wireless environment. All of these factors complicate any attempt to compute an optimal policy directly. As an alternative, we may use machine learning, in which a WACR attempts to learn an optimal policy instead of computing one [1]. A particular type of machine learning, called reinforcement learning, is especially well suited when the underlying state dynamics are Markov, as assumed in our model [1,14].

Q-learning is one of the most widely used reinforcement learning approaches [1,14,31]. The algorithm maintains a Q-table of Q-values $Q(S, a)$, each of which measures the goodness of taking action $a$ (selecting the $a$-th sub-band) when in state $S$. Hence, if the selected sub-band contains a large number of idle channels, this may lead to a high reward and, consequently, a high Q-value. However, Q-learning is not directly applicable to our POMDP sub-band selection problem, because it requires the current state $S$ to be observed, whereas the WACR observes only one sub-band at a time. Thus, in our work we have resorted to an extension of Q-learning called replicated Q-learning [1,32].

The replicated Q-learning algorithm attempts to reinforce the actions that lead to better outcomes from a given state. Each time an action is selected, the Q-table entries of that action are updated for every state $s_m$, $m = 1, \ldots, Z$, as in [1,26]:

$$ Q(s_m, a[k-1]) \leftarrow Q(s_m, a[k-1]) + \alpha\, b_m[k-1] \Big[ r(s_m, a[k-1]) + \gamma \max_{a \in \mathcal{A}} Q(b[k], a) - Q(s_m, a[k-1]) \Big] \tag{6} $$

Recall that the reward $r(s_m, a[k])$ represents the number of idle channels available in the selected sub-band, and $b_m$ is the $m$-th element of the belief state vector $b$. We denote by $\alpha \in (0,1)$ the learning rate, while the parameter $\gamma \in [0,1)$ represents the discount factor. Future actions (sub-band selections) are then selected based on the updated Q-values:

$$ a^{*} = \arg\max_{a \in \mathcal{A}} Q(b[k], a) \tag{7} $$

where $Q(b[k], a)$ is the belief-weighted average of the Q-values for taking action $a$ over all possible states, given the belief state $b[k]$:

$$ Q(b[k], a) = \sum_{m=1}^{Z} b_m[k]\, Q(s_m, a). $$
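A compact NumPy sketch of (6) and (7) follows, storing the Q-table as a $Z \times N_b$ array with row $m$ for state $s_m$ and column $a$ for sub-band $a$ (indexed from 0 in code). The update replicates the temporal-difference step across all rows of the chosen column, scaled by $b_m[k-1]$, and the bootstrap term uses the belief-weighted Q-values of the new belief. The function names, and the option of passing a scalar observed reward in place of $r(s_m, a)$, are our assumptions.

```python
import numpy as np


def q_of_belief(Q: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Q(b, a) = sum_m b_m Q(s_m, a) for every action a (length-N_b vector)."""
    return b @ Q


def replicated_q_update(Q, b_prev, a_prev, reward, b_now, alpha, gamma):
    """In-place eq. (6). `reward` is either a length-Z array r(s_m, a_prev)
    or a scalar observed reward broadcast to all states (assumption)."""
    target = reward + gamma * np.max(q_of_belief(Q, b_now))   # computed before updating
    Q[:, a_prev] += alpha * b_prev * (target - Q[:, a_prev])
    return Q


def greedy_action(Q: np.ndarray, b: np.ndarray) -> int:
    """Eq. (7): sub-band index maximizing the belief-weighted Q-value."""
    return int(np.argmax(q_of_belief(Q, b)))


# Toy example: Z = 4 states, N_b = 2 sub-bands.
Q = np.zeros((4, 2))
b_prev = np.array([1.0, 0.0, 0.0, 0.0])     # certain the state was s_1
b_now = np.full(4, 0.25)
Q = replicated_q_update(Q, b_prev, a_prev=1, reward=2.0, b_now=b_now, alpha=0.5, gamma=0.9)
print(Q[:, 1])                   # only row 0 changes: [1. 0. 0. 0.]
print(greedy_action(Q, b_prev))  # 1
```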

As with any adaptive or learning algorithm, Q-learning may get trapped in a local optimum, leading to a sub-optimal policy. To avoid this problem, we may define a new parameter called the exploration rate $\epsilon \in (0,1)$. Depending on the exploration rate, the cognitive radio switches between selecting the action characterized by (7) and selecting an action uniformly at random from all possible actions [1]:

$$ a[k] \begin{cases} = \arg\max_{a \in \mathcal{A}} Q(b[k], a), & \text{with probability } 1-\epsilon \\ \sim \mathcal{U}(\mathcal{A}), & \text{with probability } \epsilon \end{cases} \tag{8} $$

where $\mathcal{U}(\mathcal{A})$ denotes the uniform distribution over the action set $\mathcal{A}$. Choosing a high exploration rate may help update the entire Q-table and avoid being trapped in a sub-optimal policy. On the other hand, a low exploration rate helps exploit the actions already learned to be optimal. Thus, obtaining an optimal policy requires selecting an exploration rate that balances exploration and exploitation [1,8,14,26].
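The $\epsilon$-greedy rule (8) takes only a few lines. In this sketch the exploration branch draws uniformly from all $N_b$ sub-bands, including the greedy one, exactly as $\mathcal{U}(\mathcal{A})$ does in (8); the function name and the use of NumPy's `Generator` are our choices. The two exploration rates in the usage lines echo those compared in Fig. 9, and the Q-values are illustrative.

```python
import numpy as np


def epsilon_greedy(q_b: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Eq. (8): greedy sub-band with probability 1 - epsilon, uniform otherwise.
    q_b holds Q(b[k], a) for a = 0, ..., N_b - 1."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_b)))   # explore: a ~ U(A)
    return int(np.argmax(q_b))               # exploit: eq. (7)


rng = np.random.default_rng(3)
q_b = np.array([1.2, 3.4, 0.7])             # belief-weighted Q-values (illustrative)
for eps in (0.01, 0.3):
    picks = [epsilon_greedy(q_b, eps, rng) for _ in range(10_000)]
    print(eps, np.mean(np.array(picks) == 1))   # fraction of greedy choices
```

With $N_b$ sub-bands, the greedy sub-band is chosen with probability $1 - \epsilon + \epsilon/N_b$, e.g., 0.8 for $\epsilon = 0.3$ and $N_b = 3$.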


A New Spectrum State Definition for Interference Avoiding and Anti-jamming WACRs:

A common situation in which cognitive communications can be a great asset is when reliable communications are needed in the presence of unintentional interference, deliberate jammers, or both. In this case, each WACR attempts to avoid the jammer signals as well as the other WACRs' transmissions. In these situations, we may reduce the computational complexity of sub-band selection algorithms by defining a binary-valued state for each sub-band: either the sub-band is free of interference and jammers according to a certain criterion (state 1) or it is not (state 0) [2].

Under certain conditions, it can be argued that this state can reasonably be modeled as Markov. In our current work, we are developing these justifications, and future work will employ this new state definition to design lower-complexity sub-band selection and other cognitive algorithms for WACRs.
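To illustrate why a binary state can lower complexity, the sketch below assumes, as a working hypothesis rather than an established result of this report, that each sub-band's free/not-free state follows an independent two-state Markov chain with per-slot probabilities `p11` (stay free) and `p01` (become free), and that sensing is perfect. Each sub-band then needs a single belief value, $P(\text{free})$, propagated in closed form, instead of a belief over a joint state space that grows exponentially with $N_b$. The parameter values and function names are illustrative.

```python
import numpy as np


def predict_free(p_free, p11, p01):
    """One-step prediction of P(sub-band free) for independent two-state chains.
    p11 = P(free -> free), p01 = P(not free -> free); works element-wise on arrays."""
    return p_free * p11 + (1.0 - p_free) * p01


def stationary_free(p11, p01):
    """Long-run P(free) of the two-state chain: p01 / (p01 + (1 - p11))."""
    return p01 / (p01 + (1.0 - p11))


# Three sub-bands with different (illustrative) dynamics.
p11 = np.array([0.9, 0.8, 0.5])
p01 = np.array([0.1, 0.3, 0.5])
belief = np.array([1.0, 0.0, 0.5])          # current P(free) per sub-band

belief = predict_free(belief, p11, p01)     # O(N_b) work per slot
choice = int(np.argmax(belief))             # sense the sub-band most likely to be free
sensed_free = True                          # perfect-sensing outcome (assumed)
belief[choice] = 1.0 if sensed_free else 0.0
print(choice, stationary_free(p11, p01))
```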
4 RESULTS AND DISCUSSION


Robust Wideband Spectrum Knowledge Acquisition:

Previously, we compared the performance of our compressive sampling-based robust spectrum estimate of the sensed sub-band signal with that of the Gaussian-optimal periodogram estimate. To evaluate this performance in a simplified simulation scenario, we considered a sub-band signal $y$ of length $N = 128$ composed of three active signals $x_1$, $x_2$ and $x_3$ located at (discrete) center frequencies 5, 20 and 40, respectively, as shown in the top row of Fig. 4. The corresponding signal amplitudes were arbitrarily chosen to be 8, 15 and 22. Each signal has a bandwidth of $B = 7$ (in discrete frequency) around its center frequency. For random compressive sampling, the sensing matrix was drawn from a normal distribution.


Figure 4. Power spectral densities (PSDs) of the original signal (noise-free, FFT with 128 samples; top) and of the signals recovered by robust compressive sampling with compression ratios of 33%, 50%, 67% and 84% (top to bottom), plotted against discrete frequency.

For completeness, Fig. 4 shows the sub-band signal reconstructed by solving (4) as the compression ratio, i.e., the number of samples relative to the Nyquist-rate requirement, is varied from 33% to 84%, in the presence of Gaussian-Laplacian mixture noise with $\epsilon = 0.9$. Even with only 67% of the Nyquist-rate samples, the reconstructed spectrum appears adequate for signal activity detection in the presence of noise. Moreover, as shown in Fig. 5, we previously observed that our proposed algorithm, using a reduced number of (compressed) samples, can provide performance comparable to or better than that of the traditional periodogram estimate.


Figure 5. Power spectra of the original signal (FFT with 128 samples) and of those recovered by the robust compressive sampling algorithm (with 89 samples) and by the periodogram (with 128 samples), plotted against discrete frequency.


During the last performance period, we continued this performance analysis to obtain more detailed performance characteristics of the proposed compressive sensing-based robust spectral estimator. Figure 6 shows that the proposed approach remains robust against non-Gaussian noise as the amount of non-Gaussianity increases, even with fewer samples (89 samples in this case). As the performance metric, we used the normalized root mean-squared error of the power spectral density, defined as [10]:


$$ \mathrm{NRMSE}_{\mathrm{PSD}} = \mathrm{NRMSE}\!\left(|y_f|^{2},\, |y_f^{*}|^{2}\right) \tag{9} $$


Figure 7 shows how the normalized root mean-squared error (NRMSE) of the power spectrum estimate decreases both as the number of (compressive) samples increases and as the signal-to-noise ratio (SNR) increases.
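Because the exact normalization in (9) follows [10] and is not reproduced here, the sketch below uses one common choice as an explicit assumption: the Euclidean norm of the PSD error divided by the Euclidean norm of the true PSD, with the PSDs taken as $|y_f|^2$ and $|y_f^*|^2$. Per-trial values can then be averaged over Monte Carlo runs. The function name `nrmse_psd` is hypothetical.

```python
import numpy as np


def nrmse_psd(y_f_true: np.ndarray, y_f_est: np.ndarray) -> float:
    """||P_true - P_est||_2 / ||P_true||_2 with P = |y_f|^2 (assumed normalization)."""
    p_true = np.abs(y_f_true) ** 2
    p_est = np.abs(y_f_est) ** 2
    return float(np.linalg.norm(p_true - p_est) / np.linalg.norm(p_true))


y_f = np.zeros(128, dtype=complex)
y_f[[5, 20, 40]] = [8.0, 15.0, 22.0]       # illustrative sparse spectrum
print(nrmse_psd(y_f, y_f))                   # 0.0 for a perfect estimate
print(nrmse_psd(y_f, np.zeros_like(y_f)))    # 1.0 when nothing is recovered
```

Because only magnitudes enter, a recovered spectrum with the right power but a different phase scores as perfect, which suits a metric aimed at spectral activity detection.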
Figure 6. $\mathrm{NRMSE}_{\mathrm{PSD}}$ between the original and recovered signals for robust compressive sampling (for different numbers of samples) and the periodogram (128 samples), at SNR = -10 dB.

Figure 7. Normalized root mean-squared error performance of the compressive sampling-based robust spectrum estimator. (a) $\mathrm{NRMSE}_{\mathrm{PSD}}$ between the original and recovered signals for robust compressive sampling with different numbers of samples. (b) $\mathrm{NRMSE}_{\mathrm{PSD}}$ between the original and recovered signals for robust compressive sampling at different SNRs with different numbers of samples ($\epsilon = 0.9$).
In our simulation set-up of a wideband spectrum knowledge acquisition framework, the wide spectrum of interest to the WACR is composed of a set of non-contiguous frequency ranges and divided into a set of $N_b$ sub-bands [1,8,26]. The model allows these sub-bands to contain different numbers of signals with different properties, each of which can be either ON or OFF at any given time. The receiver processing modules perform down-conversion, analog-to-digital conversion of the in-phase/quadrature (I/Q) channels, and the subsequent cognitive processing in the digital domain, as proposed in [1]. As a baseline comparison scenario, we implemented spectral activity detection and feature extraction based on the periodogram approach. This simulation was used to analyze the trade-off between detection performance and feature accuracy (in this case, bandwidth) as a function of the detection threshold and the smoothing window length.


Figure 8. Dependence of detection probability and bandwidth estimation error on smoothing and the Neyman-Pearson threshold (SNR = -1 dB).

Figure 8 shows the direct periodogram estimated from the sensed sub-band signal and the smoothed periodogram estimate (after down-conversion to baseband). In this case, there are 3 active signals in this sub-band of width 40 MHz (note that the Universal Software Radio Peripheral (USRP) 2943R SDR board from National Instruments has an instantaneous bandwidth of 40 MHz). Also shown in Fig. 8 are the exact Neyman-Pearson (NP) threshold computed from theory and two other possible empirical threshold values. Note that the NP threshold is computed to maximize the detection probability for a given probability of false alarm. However, we observed that, in many cases in our assumed RF environment, this threshold may lead to larger mean squared errors in the estimated bandwidth of the detected signal. To obtain a compromise between detection probability and bandwidth estimation accuracy, we observed that the smoothing window length and the detection threshold must be traded off against each other.
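The baseline detector described above can be sketched as follows: compute the periodogram, smooth it with a moving average, compare it with a threshold, and read each signal's bandwidth off the run of contiguous bins above the threshold. The NP threshold here is the textbook one for a single unsmoothed bin under white complex Gaussian noise of known power $\sigma^2$, where each periodogram bin is exponentially distributed with mean $\sigma^2$, giving $\eta = -\sigma^2 \ln P_{FA}$. Averaging bins reduces the variance of the noise floor, so this threshold is conservative for the smoothed periodogram, which is consistent with the observation that lower empirical thresholds can perform better. The signal model, window length, $P_{FA}$ and function names are our assumptions, not the simulation settings used in this report.

```python
import numpy as np


def periodogram(x: np.ndarray) -> np.ndarray:
    """|FFT|^2 / N. Under white complex Gaussian noise of power sigma2,
    each bin is exponentially distributed with mean sigma2."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)


def smooth(p: np.ndarray, window: int) -> np.ndarray:
    """Moving-average smoothing with a rectangular window of `window` bins."""
    return np.convolve(p, np.ones(window) / window, mode="same")


def np_threshold_unsmoothed(sigma2: float, p_fa: float) -> float:
    """Threshold eta with P(Exp(mean sigma2) > eta) = p_fa for one unsmoothed bin."""
    return -sigma2 * np.log(p_fa)


def detect_bands(p: np.ndarray, threshold: float, bin_hz: float):
    """(start_bin, stop_bin, bandwidth_hz) for each run of bins above threshold;
    stop_bin is exclusive."""
    above = np.concatenate(([False], p > threshold, [False])).astype(int)
    edges = np.flatnonzero(np.diff(above))   # alternating rising/falling edges
    starts, stops = edges[0::2], edges[1::2]
    return [(int(s), int(e), (e - s) * bin_hz) for s, e in zip(starts, stops)]


# Illustrative scenario: one 50-bin signal in a 40 MHz sub-band, unit noise power.
rng = np.random.default_rng(7)
N, fs = 1024, 40e6
X = np.zeros(N, dtype=complex)
X[100:150] = np.sqrt(20.0) * np.exp(2j * np.pi * rng.random(50))   # per-bin power 20
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
x = np.fft.ifft(X) * np.sqrt(N) + noise

p_smooth = smooth(periodogram(x), window=8)
eta = np_threshold_unsmoothed(sigma2=1.0, p_fa=1e-3)
for scale in (1.0, 0.5, 0.25):               # NP, 0.5 x NP and 0.25 x NP thresholds
    bands = detect_bands(p_smooth, scale * eta, fs / N)
    widest = max(bands, key=lambda band: band[2]) if bands else None
    print(f"{scale} x NP: {len(bands)} band(s), widest = {widest}")
```

Lowering the threshold widens the detected run toward the true occupied bandwidth, but it also raises the chance that noise-only bins cross it, which is the trade-off examined in Tables 1 and 2.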

Table 1: Bandwidth estimation error of the smoothed signal with 8000 samples (averaged over 50 trials).

Threshold              SNR = -5 dB          SNR = -1 dB   SNR = 1 dB
NP threshold           No signal detected   13.5003       1.6786
0.5 × NP threshold     No signal detected   0.0896        0.0811
0.25 × NP threshold    1.6724               0.0144        0.0712

Table 1 shows the bandwidth estimation errors achieved by applying different threshold values to the smoothed periodogram estimate. The results are averaged over 50 trials, and in each trial the sub-band spectrum is estimated using 8000 signal samples. The corresponding detection probabilities for SNR = -1 dB are shown in Table 2. As can be seen, for the lower threshold values (0.5 × NP threshold and 0.25 × NP threshold) the detection probability is 100%, whereas the theoretical NP threshold yields a detection probability of only 0.32. This is because the NP threshold is optimal for the non-smoothed periodogram, not for the smoothed periodogram. Table 1 shows that these lower thresholds also lead to much smaller bandwidth estimation errors than the exact NP threshold.

Table 2: Probability of detection of the smoothed signal with 8000 samples (averaged over 50 trials).

Threshold              SNR = -1 dB
NP threshold           32%
0.5 × NP threshold     100%
0.25 × NP threshold    100%
Reinforcement Learning-based Sub-band Selection for Wideband Spectrum Sensing in Cognitive Radios:

The performance of our proposed replicated Q-learning algorithm was compared against four benchmarks. The first is the upper-bound performance, obtained by assuming that the WACR can observe the exact state at time k before selecting action a[k]. The second is the performance of the optimal sub-band selection policy obtained by solving the Bellman optimality equation [1, 33]; in other words, this is the optimal performance of the associated Markov decision process (MDP) problem. The third is a Q-learning algorithm operating under the assumption that the states are completely observable. The fourth and final benchmark is a random sub-band selection scheme in which all sub-bands are selected with equal probability.
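The second benchmark comes from the Bellman optimality equation V*(s) = max_a [R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s')]. A minimal value-iteration sketch in Python (NumPy) follows; the array layout P[a, s, s'] and R[s, a] is an assumption for illustration, and the method presumes a known transition model and fully observed states, which is what makes it a benchmark rather than a practical WACR policy.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Solve the Bellman optimality equation by value iteration.

    P[a, s, s2]: probability of moving from state s to s2 under action a.
    R[s, a]:     expected immediate reward for taking action a in state s.
    Returns the optimal value function V and a greedy (optimal) policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)
```

For example, with a single state and two actions whose rewards are 1 and 2, the discount factor γ = 0.2 used in this study gives V* = 2 / (1 - 0.2) = 2.5, and the policy always picks the second action.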

In the spectrum model, we assumed N_b = 3 sub-bands containing a total of 8 channels: the second sub-band contains 2 channels, and the remaining 6 channels are divided equally between the first and third sub-bands (3 channels each) [26]. All channels are assumed to have the same bandwidth, and their dynamics are assumed to be independent of each other. Each simulation was run for 10,000 iterations. We observed that about 1,500 iterations were needed for the Q-table to converge.
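The spectrum model above can be set up as in the following Python sketch. Purely for illustration, it assumes each channel follows its own two-state (idle/busy) Markov chain with hypothetical transition probabilities p_idle_to_busy and p_busy_to_idle; the text states only that the channel dynamics are independent, not their form. Under this binary assumption the full spectrum state has 2^8 = 256 values, while sensing one sub-band reveals only the 3 or 2 channels inside it.

```python
import numpy as np

# Channel-to-sub-band partition from the text: 3 + 2 + 3 = 8 channels.
SUBBAND_SIZES = [3, 2, 3]
SUBBANDS = np.split(np.arange(sum(SUBBAND_SIZES)), np.cumsum(SUBBAND_SIZES)[:-1])


def simulate_channels(n_iter, p_idle_to_busy, p_busy_to_idle, seed=0):
    """Simulate independent two-state Markov channels (hypothetical model).

    Returns a boolean array of shape (n_iter, 8); True means the channel is busy.
    """
    rng = np.random.default_rng(seed)
    n_ch = sum(SUBBAND_SIZES)
    busy = np.zeros((n_iter, n_ch), dtype=bool)
    busy[0] = rng.random(n_ch) < 0.5
    for k in range(1, n_iter):
        u = rng.random(n_ch)
        # A busy channel stays busy unless it frees up; an idle one may become busy.
        busy[k] = np.where(busy[k - 1], u >= p_busy_to_idle, u < p_idle_to_busy)
    return busy


def observe(busy_row, subband):
    """What the WACR sees after sensing one sub-band: only that sub-band's channels."""
    return busy_row[SUBBANDS[subband]]
```

A trace with the 10,000-iteration length used in the text is then simulate_channels(10_000, p_idle_to_busy, p_busy_to_idle), with the two probabilities chosen by the user.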


Figure 9. Comparison of normalized accumulated reward of sub-band selection policies: (a) relatively low exploration after convergence with ε = 0.01; (b) relatively high exploration after convergence with ε = 0.3.
Figure 9 compares the performance of replicated Q-learning with the other four methods mentioned above [26]. As our performance metric, we used the normalized accumulated reward, defined as

$$R_{\mathrm{norm}} = \frac{1}{N}\sum_{k=1}^{N} r(a[k]) \qquad (10)$$

where N is the number of iterations. Unless noted otherwise, a discount factor of γ = 0.2 was used. Initially, we used a high exploration rate of ε = 0.8 and a learning rate of α = 0.4. After convergence, we reduced the learning rate and the exploration rate to α = 0.1 and ε = 0.01, respectively.
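The learning setup above can be sketched in Python as follows. The update shown is the generic replicated Q-learning rule from the POMDP literature: every underlying state's Q-value is moved toward a common target in proportion to its belief probability b(s), and the belief-averaged value Q_b(b, a) = Σ_s b(s) Q(s, a) drives ε-greedy action selection. It may differ in detail from the variant used in [26]; the belief update after sensing is omitted, and the function names and the parameter switch at 1,500 iterations (the convergence point observed above) are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.2  # discount factor used in the text


def learning_params(k, converged_at=1500):
    """Assumed (alpha, epsilon) schedule: explore heavily, then exploit."""
    return (0.4, 0.8) if k < converged_at else (0.1, 0.01)


def belief_q(Q, b):
    """Belief-averaged action values Q_b(b, a) = sum_s b(s) Q(s, a)."""
    return b @ Q


def select_action(Q, b, epsilon, rng):
    """Epsilon-greedy choice of the sub-band to sense next."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # uniform over all sub-bands
    return int(np.argmax(belief_q(Q, b)))


def replicated_q_update(Q, b, a, r, b_next, alpha, gamma=GAMMA):
    """Replicated Q-learning: update every state's Q(s, a) weighted by b(s)."""
    target = r + gamma * np.max(belief_q(Q, b_next))
    Q[:, a] += alpha * b * (target - Q[:, a])
    return Q


def normalized_accumulated_reward(rewards):
    """Assumed reading of Eq. (10): average reward over the N iterations."""
    return float(np.mean(rewards))
```

In a simulation loop, each iteration k would call learning_params(k), pick a sub-band with select_action, sense it to obtain the reward and the next belief, and then apply replicated_q_update.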

As can be seen from Fig. 9(a), the random sub-band selection policy achieves only about 68% of the performance of the optimal policy. As one would expect, the performance of both Q-learning and replicated Q-learning lies between that of the optimal and random-action policies. Q-learning converges to about 95% of the performance of the optimal policy, while replicated Q-learning achieves about 84%. This is significant in three ways. First, it shows that replicated Q-learning provides noticeably better performance than simply selecting random sub-bands for sensing. Second, its performance is not far from that of the optimal sub-band selection policy, which requires complete state observability. Third, replicated Q-learning achieves about 88% of the performance of Q-learning, which is a more appropriate upper bound for comparison since it is also a learning algorithm and differs mainly in assuming full state observability [26].
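The relative figure follows directly from the two percentages read off Fig. 9(a):

```latex
\frac{0.84}{0.95} \approx 0.884 \approx 88\%
```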

Recall that the choice of ε governs the trade-off between exploration and exploitation. Figure 9(b) shows the effect of using a relatively larger value, ε = 0.3, after convergence, compared to Fig. 9(a). As can be seen from Fig. 9(b), the performance of both Q-learning and replicated Q-learning degrades: Q-learning achieves 88% of the optimal performance, while replicated Q-learning achieves only about 79%. The reason is that the higher exploration rate leads to excessive exploration: the WACR selects random actions more often than in Fig. 9(a) instead of exploiting the better actions it has already learned.
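The size of this effect can be estimated under the common ε-greedy convention, assumed here, in which an exploratory step picks uniformly among all N_b sub-bands (possibly including the greedy one). The learned best sub-band is then sensed with probability 1 - ε + ε/N_b, so with N_b = 3:

```latex
P_{\text{greedy}} = 1 - \epsilon + \frac{\epsilon}{N_b}, \qquad
\epsilon = 0.01:\; 1 - 0.01 + \tfrac{0.01}{3} \approx 0.993, \qquad
\epsilon = 0.3:\; 1 - 0.3 + \tfrac{0.3}{3} = 0.8
```

Under this assumption, roughly one sensing decision in five departs from the learned policy at ε = 0.3, versus about one in 150 at ε = 0.01.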
5  CONCLUSIONS


During the performance period just completed, we developed a compressive sampling-based robust wideband spectrum knowledge acquisition framework suitable for a wideband autonomous cognitive radio (WACR). The proposed method augments the Huber cost function with an additional ℓ1-norm penalty term in order to find a sparse spectrum estimate while remaining robust against possibly non-Gaussian noise. We observed that the proposed approach can improve wideband spectrum sensing performance in two important ways: 1) the required number of samples can be reduced, and 2) the estimation performance can be better than that of the conventional periodogram. Because spectrum knowledge acquisition involves both detection and identification of spectral activity, and because signal classification can be achieved more efficiently based on features, we performed extensive experiments to understand the heuristics involved in choosing the smoothing window and the detection threshold so as to trade off detection probability against bandwidth estimation error.
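The Huber-plus-ℓ1 idea can be illustrated with the generic problem min_x Σ_i ρ_δ(y_i - (Ax)_i) + λ‖x‖_1, where ρ_δ is the Huber function (quadratic for residuals up to δ, linear beyond). The Python sketch below solves a real-valued version by proximal gradient descent (ISTA): a gradient step on the smooth Huber term, whose derivative is the residual clipped to [-δ, δ], followed by soft-thresholding for the ℓ1 penalty. The solver, the real-valued setting, and the names A, y, delta, and lam are assumptions for illustration; the report's estimator may use a different solver and complex-valued measurements.

```python
import numpy as np


def huber_grad(r, delta):
    """Derivative of the Huber function: the residual clipped to [-delta, delta]."""
    return np.clip(r, -delta, delta)


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def huber_l1_ista(A, y, delta, lam, n_iter=500):
    """Minimize sum_i huber_delta(y_i - (A x)_i) + lam * ||x||_1 via ISTA.

    The Huber term's gradient is Lipschitz with constant ||A||_2^2, so the
    fixed step size 1 / ||A||_2^2 ensures convergence.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = -A.T @ huber_grad(y - A @ x, delta)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

As a robustness check, estimating a single level from the measurements [1, 1, 1, 1, 100] with A a column of ones, δ = 1 and λ = 0 gives 1.25, whereas a least-squares fit would give 20.8; the clipped gradient limits the outlier's pull.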

To develop a sub-band selection mechanism suitable for a WACR, we modeled the sub-band selection problem as a partially observable Markov decision process (POMDP) in which only a single sub-band out of all available sub-bands in the spectrum of interest can be sensed at any given time. This model was then used to develop an effective, low-complexity sub-band selection policy based on a machine learning algorithm called replicated Q-learning. Simulation results showed that the proposed replicated Q-learning approach provides a substantial improvement over the random sub-band selection policy. We also showed that, in practice, it is better to use a relatively large exploration rate at the beginning to achieve fast learning; after convergence, however, the exploration rate should be reduced to reap the benefits of the already learned actions.

Finally, we observed that the original approach proposed in [1] for defining the spectrum sub-band state may lead to sub-band selection algorithms with unacceptably high computational complexity. As a result, we have been investigating new mathematical definitions of the sub-band state that may apply only in specific contexts but can lead to decision policies, learning algorithms, and cognitive protocols with sufficiently low computational complexity.
6  RECOMMENDATIONS


Spectrum is a battlefield. As more nations develop advanced communications and radar technologies, mission success will be challenged by various adverse spectrum conditions, including both deliberate and inadvertent interference. Moreover, encroachment on previously allocated spectrum resources by commercial and unlicensed/unauthorized users can only be expected to grow in the coming years. The combination of these traditional and evolving spectrum challenges requires future communications technologies to be intelligent, self-aware, and spectrally agile. Unlike the cognitive radios treated in most of the existing literature, the WACRs proposed in [1] and pursued in this project are radios with these defining characteristics, which will allow autonomous radio communications over non-contiguous wide spectrum bands in the presence of adverse conditions. WACR technology has the potential to revolutionize future communications systems. It is therefore recommended that future efforts focus on fully developing WACR technology.

The compressive sampling-based robust wideband spectrum sensing approach developed in this project can form the first step in the spectrum knowledge acquisition framework of a WACR. The approach can help handle large sub-band bandwidths under realistic hardware constraints while remaining robust against non-Gaussian noise statistics. Based on the experience gained from developing our reinforcement learning-aided sub-band selection algorithm for a WACR, we believe that the new state definition can lead to algorithms with considerably lower computational complexity. Future efforts must therefore focus on reformulating some of our previously developed models using this new state definition and on developing machine learning-based cognitive communications algorithms suitable for wideband autonomous cognitive radios.
LIST OF SYMBOLS, ABBREVIATIONS, AND ACRONYMS

DFT      Discrete Fourier Transform
DSS      Dynamic Spectrum Sharing
EMI      Electromagnetic Interference
MDP      Markov Decision Process
NP       Neyman-Pearson
NRMSE    Normalized Root Mean-Squared Error
POMDP    Partially Observable Markov Decision Process
PSD      Power Spectral Density
RF       Radio Frequency
SDR      Software-Defined Radio
SNR      Signal-to-Noise Ratio
USRP     Universal Software Radio Peripheral
WACR     Wideband Autonomous Cognitive Radio